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Abstract. Quantum-mechanical approach yields much better performance compared 
to the classical approach in a wide range of applications. This qualitative improvement 
is enabled by appropriate use of novel quantum effects. We investigate whether this 
paradigm is also valid for machine learning. In particular, the question we address 
here is: Can the machine learning be improved by using favorable quantum effects? 
To answer this question, we compare two controllable circuits, designed to learn a 
task of A-bit Boolean function: One is classical and the other is quantum. In the 
comparison, we highlight the local solution-region of the whole search space, so-called 
acceptable region. It is shown that the acceptable region of the quantum circuit can 
always be wider than that of the classical one, and consequently, it leads to quantum- 
speedup. We clarify that the underlying physics is quantum superposition principle in 
our analysis. To support and to corroborate the analysis, the numerical simulations are 
performed by considering two learning models. First, we consider a primitive model, 
random search, to give a clear understanding of the quantum-speedup enabled by 
extending the acceptable region. Then, we apply more practical learning model, called 
differential evolution, which is known as one of the most efficient learning methods. In 
this model, the quantum circuit exhibits faster learning speed and better convergence 
than the classical circuit. 
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1. Introduction 

Machine learning is known as a sub-field of artificial intelligence and one of the most 
advanced automatic control schemes. "Learning" is usually thought of as a characteristic 
of human or living things, but machine learning enables a machine also to learn a task 
PQ . Machine learning has long been attracting great attention due to its ability to learn 
or to teach. In addition, the machine learning is expected to provide reliable control 
techniques in designing complex systems in physics and engineering [Tj. Recently, the 
machine learning has been applied to a variety of applications in quantum information 
science, e.g., quantum coherence control El IH E], state engineering [HI E], and 
quantum-algorithm design [8]. 

Over the last few decades, quantum physics has brought about remarkable 
innovations in a variety of the field. For instance, there are exponentially fast 
quantum computing algorithms over their classical counterparts [91 [101 [11] . Quantum 
metrology improves the physical limit of measurement precision [121 [T3] • In quantum 
cryptography, many protocols that offer a higher security against an eavesdropper have 
been proposed [HJ EE] • These improvements based on the "quantum" are enabled by 
appropriate use of novel quantum features, such as quantum superposition enabling 
the simultaneous evaluation of input information (so-called "quantum parallelism" ) and 
quantum entanglement engaging the strong correlation. At present, one may ask as 
follows: Can "quantum" improve the performance of machine learning? Several studies 
concerning this question have been made. As an attempt, it was shown that the quantum 
neural networks might be more efficient than their classical counterparts in some cases 
[To] IT71 [T£] . The study of reference [TH] provided an important motivation for extending 
the machine learning to quantum field, along with above question. Recently, quantum- 
inspired evolutionary algorithm was proposed, where the genes were dealt with as the 
qubit-strings [20]. The proposed algorithm requires small populations (which is the set 
of candidate solutions) to get the best solution, compared to that of using the classical 
genes [21] ■ More recently, remarkable studies presented the improvement of the learning 
performance assisted by quantum system [221 E3] However, it is still unclear what and 
how quantum effects works in machine learning. 

In this paper, we consider a machine learning in classical and quantum systems. 
Our goal is to compare the learning in these two systems and to elucidate the role of 
"quantum" in machine learning. To do this, we design two kind of controllable circuits 
to learn a task of iV-bit Boolean function: One is classical circuit which consists of 
only the classical elements, and the other is quantum circuit that contains quantum 
gates in a channel. In the comparison, we investigate a "task-fidelity" to quantify which 
circuit is more favorable to implement a desired function. For an equivalent setting, 
the quantum circuit has wider acceptable region of the approximate target functions on 
the task-fidelity landscape. Consequently, it leads to quantum-speedup. We clarify that 
the underlying physics of the speedup is quantum superposition principle, which is only 
involved in the quantum circuit. By using a Monte-Carlo method, we demonstrate that 
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the acceptable region of the classical circuit is rapidly decreased with increasing the size 
of search space, whereas that of the quantum circuit is not. We perform the numerical 
simulations to support and to corroborate our analysis, by considering two learning 
models: One is a random search, and the other is a differential evolution. Firstly, in 
the random search, we give a clear understanding of the quantum-speedup, enabled by 
extending the acceptable region. Then we consider more practical learning model, called 
differential evolution, which is known as one of the most efficient learning methods [24J . 
In this model, it is shown that the quantum circuit exhibits faster learning speed and 
better convergence than the classical circuit. To our knowledge, this is the first work 
that shows, explicitly, the beneficial role of the quantum superposition in the machine 
learning. 

This paper is organized as follows: In Section [2j we design classical and quantum 
circuit to learn a task of iV-bit Boolean function, and describe the learning process on 
them. Section [3] is devoted to explain the speedup of the quantum circuit. We clarify 
that the speedup comes from quantum superposition involved in the quantum circuit. 
In Section [4], the numerical simulations are performed to support and to corroborate 
our analysis, by considering two learning models. The remarks are given in the last 
Section |U 

2. Machine learning in classical and quantum system 

In this section, we design two controllable circuits to learn a task of iV-bit Boolean 
function, where two circuits are implemented in classical and quantum-mechanical ways. 
Then we describe how these circuits learn a given target function. 

2.1. Typical process of machine learning 

A human (or other living things) behaves, and accumulates experiences during the 
behaviors. The behaviors are progressively changed based on the accumulated 
experiences. The interplay of the "behavior and its change" has generally been regarded 
as an important aspect of a human learning [25]. In machine learning, these two 
processes, behaving and its change, can be realized on a physical system assisted with 
a feedback; A physical system is supposed to learn a given task and the feedback is 
responsible for the learning. As is usual, it is convenient to model the physical system 
by a circuit implementing the function /, corresponding to the task to be learned. Then, 
the behaviors of the circuit are characterized as f:x — > y, where x and y denote input and 
output, respectively. The accumulated data of the inputs and the outputs correspond 
to the experiences in learning. In the circumstance, the process of machine learning is 
simply formulated to find a certain function / that generates desired outputs, based on 
the observations of the inputs and the outputs in supervised learning, and otherwise, 
that of finding some hidden structure within a given set in unsupervised learning. The 
main difference between the supervised and unsupervised learning is that in latter case 
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Figure 1. Computation Device: The computation device consist of 2™ gates controlled 
by input x on the work channel to generate arbitrary reversible boolean function by 
modifying parameters 



the desired output is not known. We consider the case of supervised learning throughout 
this work. 

2.2. Classical and quantum circuit to learn a task of N -bit Boolean function 

As a task to be learned, we consider a Boolean function defined on N input bits, such 
as / : M N B 1 , where B^ = {0,1}^ is a iV-bit Boolean domain. In this sense, we 
call it "iV-bit Boolean function". This function is written as f{x) = y, where the input 
x = %n . . . x 2 Xi G B^ is given by iV-bit strings and the output y is to be or 1. The 
Boolean function / is irreversible in the sense that the original information of input 
x cannot be recovered from the output y. The function / can be expressed by using 
Positive Polarity Reed-Muller expansion [26J, 

2 N -l / \ 

f(x) = a Q © aiXi © a 2 x 2 © a 3 xix 2 © ■ ■ ■ © a 2 N_ 1 xi ■ ■ ■ x N = @ \a k \\ Xi J , (1) 

k=o \ iec k J 

where "©" denotes the modulo-2 addition, "0" means a direct sum of the modulus, and 
the coefficients a k (so-called a Reed-Muller coefficients) are given to be or 1. Here, C k 
is an index set whose elements are given in such a way: Firstly, k is written as iV-binary 
strings k^ . . . k 2 k\. Then, the number j is taken to be an element of C k only if kj is 
equal to 1. Note here that the Reed-Muller coefficients a k e {0, 1} determine uniquely 
a Boolean function / among 2 2N possible functions. 

In order to implement the function / in both classical- and quantum-mechanical 
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way, it is required to design reversible circuit. The Equation ([I]) can realize to the 
reversible circuit depicted in Figure [TJ employing an additional work channel and 
controlled gates [27J EE]- In the circuit, a single-bit gate Go is placed on the work 
channel and (2 N — 1) number of controlled-G^ gates are acted on the work channel 
conditioned that all the control bits {x,i\i G G&} are 1. The input signal c on the work 
channel is fixed to 0. The operation Gk in Figure [I] is given to be either identity (i.e., 
doing nothing) if = or NOT (i.e., flipping an input bit to its complement bit) if 
ak = 1. For instance, 1-bit Boolean function (i.e., N = 1) has 2 2 * = 4 sets of Reed- 
Muller coefficients (ao, cti), which determine all possible Boolean functions. In Table [TJ 
we present the four possible 1-bit Boolean functions with Reed-Muller coefficients and 
gate operations. 
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Table 1. Four possible 1-bit (deterministic) Boolean functions are given with Reed- 
Mullcr coefficients ao,i and gate operations Gq,i- In characterizing these functions, 
the constant input c is set to be 0. These are common in both classical and quantum 
circuits. 



Now let us differentiate between classical and quantum circuits. The classical circuit 
consists of classical channels and gates. Thus, the classical circuit-implementation is 
described as 

(x,c) ^(x,c@y) (2) 

with classical x, y, and c G {0, 1}. Taking into account a probabilistic task, we consider 
that gate operations G^'s are applied probabilistically: G& performs identity and NOT 
operation with a probability pk and 1 — Pk, respectively, which is realized by adopting 
the Reed-Muller coefficients ak probabilistically. The probabilistic task of the classical 
circuit is primarily intended for a fair comparison with the quantum circuit. On the other 
hand, the quantum circuit is realized by replacing the work channel with a quantum 
channel. The input channels are left in classical. The quantum elements in the work 
channel indeed causes the change of the learning efficiency, as shown later. Thus, the 
quantum circuit-implementation is described as 

(x,\c))-U(x,m, (3) 

where the signal on the work channel is encoded into a qubit state. In this circumstance, 
it is appropriate to replace the classical gate operation Gk to the unitary operation, 

-e-^^T^p~k VP~k J ' 
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where pk is the probability of Gk performing identity (i.e., |0) — > |0), |1) — > |1)) and 
1 - p k is that of G fc performing NOT (ie., |0) -e _i ** |1), |1) -»■ e i?ifc |0)) with its 
relative phase (pk- Therefore, such a quantum gate with pk is equivalent to a classical 
gate with corresponding pk except for the relative phase. The probability pk of the gate 
is controllable, even in any quantum experiments ED]- In this sense, we can regard 
the probabilities pk (k = 0,1, ...,2^ — 1) as the control parameters of the reversible 
circuit without any loss of generality. We assume further that the relative phase (pk of 
the quantum circuit is also adjustable. This is crucial in the speedup of the quantum 
circuit, as shown in later. 

2. 3. Leaning of the circuit by maximizing a task-fidelity 

Now, we consider feedback to teach the circuit. A system for feedback is usually 
equipped with a finite memory to record the control parameters and sometimes the 
information required to the learning process, such as inputs and outputs. It also contains 
a learning algorithm as the rule of updating the circuit. The effectiveness and efficiency 
of the learning depend on the chosen learning algorithm. 

Before describing details of the learning, let us consider a conditional probability 
P(y\x), which is the probability that a circuit transforms an input x to an output y. 
A state of circuit is represented by a set P = {P(y\x)} of conditional probabilities for 
all possible input x and output y. In the learning process, we also use a conditional 
probability set P T = {P T (y\x)} of target function as a reference. For example, if the 
circuit is to learn the function f\ in Table [TJ the conditional probability set of target 
function is given as 

P r = {P T (0|0) = 1,P T (1|0) = 0,P r (0|l) = 1,P T (1|1) = 0}. (5) 

Thus, the goal of the learning in this work is that P becomes close to P T . To do this, 
it is required to measure how close P and P T are. In our work, we employ a measure, 
called "task-fidelity" , 

t= (n^) 2 ' (6) 

where F x = a/ ' P{y\x)P T (y\x) is a classical fidelity between the conditional 
probabilities P(y\x) and P T (y\x) for x [31J. In such definition, J 7 becomes close to 
1 only if all F x are close to 1 and otherwise, J 7 is far less than 1. 

In this circumstance, the learning in our circuit is as follows: Firstly, we prepare the 
circuit by choosing the control parameters pk (k = 0, 1, . . . , 2 — 1) initially at random. 
Then, the circuit transforms the inputs x to the outputs y. The system for feedback 
updates the control parameters pk reconfiguring P = {P(y\x)} maximizing the task- 
fidelity for a given set P T = {P T (y\x)} of target function. Basically, the learning of our 
circuit can be described by the repetition of the last two steps, either in the classical or 
in the quantum circuit. If P becomes close to P r with a sufficiently high accuracy, the 
learning is completed. 
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Before closing, we clarify that the quantum circuit can deal with the quantum 
superposition with the quantum gates in the work channel, whereas any entanglement 
is not involved in. Nevertheless, we can expect an improvement of the learning efficiency, 
without any contribution of entanglement [32l [33] . 

3. Analytical analysis 

In the previous section, we designed two kind of circuit to learn a task of iV-bit Boolean 
function. In this section, we try to answer the question of whether the quantum circuit 
can improve the learning efficiency The answer is affirmative. 

The circuit is characterized by the control parameters (po,Pi, ■ ■ ■ ,P2 N -i), 
corresponding to a point in 2 Ar -dimensional search space, in which each point determines 
a state of circuit, P. For example, a circuit for 1-bit Boolean function is characterized by 
a point on the space (po,Pi) with its two control parameters pu (k = 0, 1). In this case, 
the circuit of the four possible (deterministic) functions fj (j = 1, 2, 3, 4) are represented 
by the points on the search space: Pi(l, 1), P2(0, 1), Ps(l, 0), and P^O, 0). Considering 
the probabilistic task, a point (po,Pi) characterizing the circuit can be placed inside or 
on the square of P!P 2 P4P3. 

In a realistic circumstance, it is usually impractical to learn exact solution. Thus, 
finding approximate solutions near to the exact one is more appropriate [Tj. Also in 
this work, finding approximate target function is more reasonable rather than finding 
the exact target function, i.e., the state of circuit P does not have to exactly be the 
target P T to complete a learning. We can therefore consider a local solution-region of 
whole search space surrounding the point of the target function. We call this region 
an "acceptable region" of the approximate target functions. The acceptable region 
is localized by a given tolerance level of accuracy J"c,q > 1 — e with e <C 1. Then, 
the learning is reformulated as a search for an approximate target function inside the 
acceptable region. Therefore, the learning time and the convergence depend primarily 
on the size of the acceptable region; it is expected that larger acceptable region would 
make the learning faster, although this is not always the case |24j . 

Then, we discuss an improvement of learning efficiency by using the quantum 
circuit. We prove that the acceptable region on the task-fidelity landscape of the 
quantum circuit can always be larger than that of the classical circuit. To do this, 
let us consider a simple example of learning a 1-bit Boolean function fx in Table [Tj 
Then, the conditional probability set P T of the target function is given by Equation ([5]), 
and the acceptable region of the approximate solutions is localized near the point Pi for 
a given tolerance level of accuracy. We shall now investigate the acceptable region of 
the classical and the quantum circuits. Firstly, we calculate the task-fidelity T between 
an arbitrary 1-bit Boolean function and target function fx, given by 
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Applying the conditional probability set of target function in Equation ([5]), task-fidelity 
of Equation <Q is reduced to 



P(0|0)P(0|1) 



(8) 



which is common to both classical and quantum circuits. For the case of the classical 
circuit, the conditional probabilities are given as 

Pc(0|0)=poPi+R>(l-Pi)=Po, (9) 
Pc(0|1)=PqPi + (1-Po)(1-Pi)- (10) 
Then, we have the task-fidelity J-q for the classical circuit, 



Po-Po(po+Pi) + 2poPi 

On the other hand, in the case of the quantum circuit, the conditional probabilities are 
given as 



P Q (0|0)= (0|Go|0> =P c (0|0) 



(12) 



P Q (0|1)= (OldGolO) =P c (0|l)-2 v / PoPi(l-Po)(l-Pi)cosA, (13) 



where A = <fi\ — <f) . Note here that the last term in Equation (13) appears due to the 
relative phases in Go and 0%. The task-fidelity Tq, for the quantum circuit is written as 

(14) 



Q 



X-4 
•J n 



2p a/pqPi(1 -po)(l - Pi) cos A 



Here, by observing Equation (11) and (14), it is easily found that J-q is greater than 
J-c when cos A < 0, equal to Fq when cos A = 0, and smaller than Tq when cos A > 0. 
These are apparently the results of quantum interference by consecutive quantum gates 
in the quantum circuit. While the second case reveals no quantum interference, the first 
and last case show the constructive and destructive interferences, respectively. Thus, 
the learning in the quantum circuit provides such quantum interferences, so that we 
can employ one of them favorably by changing <f) and <\>\. The result of constructive 
interference, i.e., the increment of the task-fidelity, implies speedup of the learning. To 
see this more clearly, we consider the acceptable region of the approximate solutions 
in the search space of (p , pi). In Figure |2j we present the acceptable region for (a) 
J-c > 0.95 and (b) > 0.95. The blue-line is made by the points of J-"c,q = 0.95. 
To maximize the interference effects, we set A = 7r (i.e., cos A = —1). It is directly 
seen that the acceptable region of J-q > 0.95 is larger about 5.6 times than that of 
J~c > 0.95. Therefore, learning in the quantum circuit is expected to be faster than 
that in the classical circuit; i.e., quantum-speedup. Here we clarify that the quantum- 
speedup comes from quantum superposition involved in the quantum gates. 

Our analysis can be generalized to the learning for an arbitrary iV-bit Boolean 
function for N > 2. The improvement of the learning efficiency would be enabled by 
extending the acceptable region of the approximate solutions in the 2 7V -dimensional 
search space of (po,Pi, ■ ■ ■ ,P2 N -i)- We firstly consider the task-fidelities Pc.q for iV-bit 
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(b) Quantum circuit 

Figure 2. Task- fidelity landscapes of (a) the classical and (b) the quantum circuits for 
1-bit Boolean function. For the given two circuits, we indicate the acceptable region 
of the approximate target functions. Inside the blue-line denotes the acceptable region 
localized by J^cq > 0.95. It is directly shown that the region of Fq > 0.95 are larger 
than that of Fq > 0.95: about 5.6 times. 



boolean function. The constant function that always gives as an outcome is chosen as 
a target function, and the conditional probability set of that is given as 

P r = {P T (0\x) = 1, P r (l|x) =0\xE M N } . (15) 

Then, the task-fidelities and J-q are simply reduced into 
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where the symbol |.| denotes norm of complex number, and the set A x contains all 
index of gate operators activated by input x. For example, if x = 2 (or 10 in binary 
representation), G and G 2 is activated, so A x=2 = {0,2}. In order to maximize 
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quantum-speedup, we optimize 2^ number of phases (fik (k = 0, 1, 2, . . . , 2 N — 1). 
For simplicity of calculation, we assume isotropic changes of all the parameters from 
the solution. Such assumption implies that the fidelity function varies equally as 
parameter vector changes to any direction from the solution. So that we can choose one 
of any direction from the solution, simply, we assume all parameters to change equally 
from the solution. This implies all parameters to be same with each others, since 
solution of learning target is Equation (15). Note here that, the product of arbitrary 
two unitaries, GkGk' or Gk'Gk, is identity when the difference of their phases, \4>k — 4>k'\i 
is 7r. In the meantime, for a given x except x = 0, the number of elements in the set 
is always even: Half of elements in the set contains even number of l's in their binary 
representation, and the other half contains odd number of that. Thus, the number of 
Gk(k G A x ) having 0-phase and 7r-phase are always same whatever x is (except 0), if 
the phases are given by 

N 

if kj is even, 

j ll (18) 
if kj is odd, 







7T 



where kj G {0,1} is j-th number in binary strings of k — k^-'-k^ki. This implies 
llfceAa, &k in Equation (17) to be always identity except the case of x — 0, because all 
the pairwise gate operators in sequence have 7r-phase difference, and the product of two 
gate operators with 7r-phase difference is identity. Here, let F x be || (0| rifceA &k |0) || 

,1/2" 



term in Equation ([17). Then, Equation (17) cannot be grater than Fq 1 ^ 2 = po 1 
since F x < 1 for all x ^ 0. As we saw before, Equation (18) result in all F x (x ^ 0) 
to be 1, so that it leads Equation (17) to be maximized, as well. Note that actual 
task-fidelity J-q does not have isotropy geometry, against our assumption. However, we 
focus on acceptable region near by the solution. All parameters approximately same in 
this region, and Equation (18) produces empirically good result. Therefore, we use this 
phases for learning of A-bit Boolean function. 

Then, we compare the acceptable regions of the two circuits based on the increment 
of task-fidelity for 1-bit Boolean function fx in Table [Tj To see this more clearly, we 
investigate the acceptable regions by using a Monte-Carlo method. In Table [2j we 
present the ratio 7 of the acceptable region of J-"c,q > 0.95 to the whole search space 
with respect to the size of the search space D = 2 1 , 2 2 , and 2 3 . The result shows that 
the acceptable region of the classical circuit is reduced much more rapidly than that of 
the quantum circuit with increasing D. Therefore, we predict that the learning in the 
quantum circuit becomes much more faster than that in the classical circuit in a large 
search space. Here we stress again that quantum superposition is at the heart of the 
quantum-speedup. 
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Table 2. The ratio 7c, q of acceptable region to whole search space is given with 
respect to the size of the search space D = 2 N . The accuracy limit is given by the 
condition Tc Q > 0.95. It is observed that 7 for the classical circuit is reduced much 
more rapidly than that for the quantum circuit with increasing D. 



4. Numerical analysis 

In this section, numerical simulations are performed to support and corroborate our 
analysis. The simulations are carried out by considering two learning models: One 
is a random search [34], and the other is a differential evolution [24J. The random 
search is a primitive learning model, which is usually used for investigating the learning 
properties rather than for any practical purposes. This model is applied to give a clear 
understanding of the quantum-speedup enabled by extending the acceptable region. 
Then, we apply the differential evolution, which is more practical learning model that 
belongs to the category of evolutionary learning [24J. 



4-1. Random search 

A random search model is very simply implemented: First, all 2 N control parameters pk 
are randomly chosen. Then, task-fidelity is measured to evaluate how well the circuit 
performs the target function with the chosen set {pk}- These two steps are thought of 
as a single iteration of the learning procedure. The iterations are repeated until the 
task-fidelity of the chosen set {pk} satisfies J^cq > 1 - e (e < 1). In this case, the 
probability of completing learning is simply that of a randomly generated set {pk} to be 
inside the acceptable region. This probability is equal to 7. Then, we consider a learning 
probability P{n), which is defined as a probability that learning is completed before or 
at the number of iteration, n [8]. The learning probability P(n) is approximately given 
as follows: 

n-l 

Pin) = J2 7(1 - l) 3 = 1 - (1 - T )« ~ 1 - e"^, (19) 
3=0 

where we used 1 — 7 ~ e -7 , discarding the order of 0(7 2 ) — > in the limit of high- 



accuracy e — > 0. Note that P(n) of Equation (19) is given in terms of a cumulative 
distribution function of n, and I/7 characterizes how many iterations are needed for the 
completion of learning. Thus, the learning efficiency is characterized thoroughly by the 
acceptable region in this learning model. 

The numerical simulations are done with increasing the input size N of Boolean 
function from 1 to 3. Thus, the size of the search space D exponentially grows from 
2 1 to 2 3 . In the simulations, we set the tolerance of task-fidelity with e = 0.05; i.e., 
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Figure 3. Learning probability P(n) is drawn for D = 2 1 ,2 2 ,2 3 . The data is made 
by sampling 4000 simulations (400 simulations only for the 3-bit classical circuit). The 
data are well fitted to the function of P(n) = 1 — c~"/" c (red solid-lines) adopted 
by Equation (19 1. We obtain the characteristic constant n c : For D = 2 1 ,2 2 ,2 3 , 
n c ~ 1.02 x 10 2 , 1.36 x 10 4 , and 4.67 x 10 8 for the classical circuit (CC, dashed- 
lines), and n c ~ 1.78 x 10 1 , 2.58 x 10 1 , and 5.36 x 10 1 for the quantum circuit (QC, 
dotted-lines) . The characteristic constant n c for CC is much smaller than that for QC. 



learning is terminated when J^cq > 0.95. For convenience, we set the target to be a 
constant function that generates the output for all input x. In Figure [3j we present 
the graph of the learning probability P(n) for N = 1,2, and 3. The data are made from 
the sampling of 4000 simulations (400 simulations only for the N = 3 classical circuit). 
They are well fitted to the function P{n) = 1 — e~ n /" c adopted by Equation (19), where 
n c is a "characteristic constant" of the learning efficiency We obtain the characteristic 
constants for N = 1,2, and 3 as n c ~ 1.02 x 10 2 , 1.36 x 10 4 , and 4.67 x 10 8 for the 
classical circuit, and n c ~ 1.78 x 10 1 , 2.58 x 10 1 , and 5.36 x 10 1 for the quantum circuit. 
These results are in good agreement with I/7 observed in Table |2} Note here that we 
can directly see the quantum-speedup with much smaller n c in the quantum circuit. 



4-2. Differential evolution 

A general analysis of the learning efficiency is very complicated as too many factors 
are associated with the learning behavior. Further, most efficient learning algorithms 
tend to use the heuristic rules and are problem-specific [351 EE]- Nevertheless, it has 
been believed that the acceptable region is a key factor of the learning efficiency in a 
heuristic manner [37]. In this sense, we conjecture that the quantum circuit would offer 
a higher learning efficiency than the classical one in such a practical learning model. 
Here, we consider a differential evolution, which is known as one of the most efficient 
learning methods for the global optimization [21]. The differential evolution follows: 
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Figure 4. Average task-fidelity .Fc,q versus iteration n graph. The simulations are 
performed with increasing N from 1 to 7. In the simulations, we set the target to be 
the constant function, same as above, and take M = 50. W is fixed to 0.4 for all N 
and R is chosen between 0.8 and 0.9 for each N . The data are averaged over 1000 
simulations. As the iteration n goes on, average task-fidelity for the quantum circuit 
(QC) approaches to ~ 1 (dashed, colored lines), faster than that for the classical circuit 
(CC) (solid, colored lines). 



To begin, we randomly prepare M sets of control parameters, {pk}i (i = 1, 2, ... , M). 
Here, it is convenient to describe the control parameter set as 2 N - dimensional real vector 
Pi = (pi,p2, ■ ■ ■ ,P2 N -i), an d we call Pi a control parameter vector. [D.l] We generate 
M mutant vectors rrij according to 

mi=p a + W(p b -p c ), (20) 

where we randomly choose mutually different a, b, and c among M control parameter 
vectors. A free parameter W is called a differential weight. [D.2] After that, the control 
parameter vectors PiS are reformed to the trial vectors t{ by the following rule: For 
each k, 

tk,i <- Pk,i if r k > R or k = s, 
tk,i Wfc,i otherwise, 

where pk,i and rrik,i represent k-th component of vector £j, pi and rrii, respectively, 
and rk G [0, 1] and s G {1,2,. ..,M} are a randomly generated number and the crossover 
rate R is another free parameter we choose in [0, 1]. [D.3] lastly, the trial vector tj is 
taken to be p/ for the next iteration if it yields a larger task-fidelity than Pi, and 
otherwise p^ is retained and taken to be p/. The steps [D.1]-[D.3] are repeated as a 
single iteration of the learning procedures. The learning is terminated when a parameter 
vector Pi of ^cq > 0.95 is found. 

The numerical simulations are performed with increasing iV from 1 to 9. In the 
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Figure 5. We present the characteristic constant n c with respect to the size of search 
space D. The condition to complete learning is given by the tolerance level of accuracy 
■7"C,Q < 0.95. The data are made by averaging over 1000 simulations. The data are 
well fitted to n c ~ aD 13 with a ~ 3.82, /3 ~ 0.97 in the classical circuit (CC), and with 
a ~ 1.61, /3 ~ 0.80 in the quantum circuit (QC). 

simulations, we set the target to be the constant function, same as in the previous sub- 
section, and take M = 50 for all N. Free parameters W and R are chosen to achieve 
the best learning efficiency for each circuit: W is fixed to ~ 0.4 and R is chosen between 
0.8 and 0.9 for all N. In Figure |4j we present the average task-fidelity J-"c,q over M. 
The data are averaged over 1000 trials of simulation. In both two circuits, classical and 
quantum, the task-fidelity is increased toward ~ 1 as the iteration goes on. However, it 
is observed that learning in the quantum circuit is much faster than that in the classical 
one for all the cases of N. 

We also investigate the characteristic constant n c with respect to the size of search 
space D = 2 . The condition to complete the learning is given at the accuracy level 
•7~c,q > 0.95. In Figure [5j the graph of n c versus D is given. The data are made by 
averaging over 1000 simulations. It is straightforwardly seen that the classical circuit 
requires much more iterations to complete learning than quantum one. The gap of the 
number of required iterations between classical and quantum circuits becomes larger 
with increasing D. The data are well fitted to n c ~ aD 13 with a ~ 3.82, ~ 0.97 for 
the classical circuit, and with a ~ 1.61, ~ 0.80 for the quantum circuit [J] Note here 
that the fitting parameter /3 for the quantum circuit is smaller than that of the classical 
one. This implies that the convergence of the classical circuit deteriorates much faster 
than quantum one, i.e., the quantum circuit exhibits better convergence. 

| Such polynomial result shows much improvement of optimization in DE against the random search 
which gives rise to exponential result (I/7 in Table |2|). 
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5. Remarks 

We investigated as to whether a quantum system can improve machine learning 
performance. For this, we considered classical and quantum circuits to learn a task 
of iV-bit Boolean function. These two circuits were designed to use same computational 
resources, e.g., number of parameters and memory in this work. The only difference 
between two circuits was whether circuit involves "quantum" or not. Comparing the 
learning in these two circuits, it was shown that, for equivalent settings, the acceptable 
region on the task-fidelity landscape of the quantum circuit can be wider than that of 
classical one, and consequently, it leaded the quantum-speedup. We clarified that the 
underlying physics is quantum superposition principle in our analysis. To support and 
to corroborate the analysis, the numerical simulations were performed, by considering 
two learning models. In a random search, the quantum speedup was explicitly observed 
as the result of extended acceptable region by the quantum superposition, and in a 
differential evolution, it is shown that the quantum circuit exhibits faster learning speed 
and better convergence in finding approximate target functions than the classical circuit. 
We expect that our work motivates researchers to study the role of various quantum 
effects in machine learning, and open up new possibilities to improve the machine 
learning performance. 
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